Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proctree improvements (RSS/Performance) #4261

Merged
merged 4 commits into from
Aug 23, 2024
Merged

Conversation

geyslan
Copy link
Member

@geyslan geyslan commented Aug 22, 2024

1. Explain what the PR does

4b89620 fix: create a new fresh string for comm
cb89e2f perf: use proctree expirable lru cache
976704f perf: remove mutex, add concurrency tests
36ca022 chore: benchmark proctree

4b89620 fix: create a new fresh string for comm

comm had the same inner address as the original string which it was
created from (basename). This caused the original underlying bytes to
be preserved and not garbage collected.

Converting it to a byte slice and then back to a string guarantees that
a new string is created releasing the original reference.

cb89e2f perf: use proctree expirable lru cache

hashicorp LRU increases RSS over time. This change uses the proctree
lru cache to an expirable one, which will remove entries after a
certain time.

Two flags/options were added to the proctree command to control the
cache TTLs. The default is 120 seconds.

flags:

  --proctree process-cache-ttl=60
  --proctree thread-cache-ttl=60

yaml config:

    cache-ttl:
        process: 60
        thread: 60

976704f perf: remove mutex, add concurrency tests

Public proctree methods already make use of the mutex of the inner lru
cache. This commit removes the outer mutex and adds concurrency tests
to ensure that the proctree is safe to use concurrently.

2. Explain how to test it

Run tracee

sudo ./dist/tracee -e execve --proctree source=both --proctree process-cache-ttl=60 --proctree thread-cache-ttl=60 -o none

Stress it

i=0; while ((i < 300000)); do ls >/dev/null; ((i++)) ; done

Measurements

We consider looking at only the 000000c0000000000 segment, narrowing down to where the proctree allocates memory. Not doing this would give us the full RSS which contains the perf buffer segments that are allocated per cpu.

PROCTREE MAIN
-------------

000000c000000000   53248   50932   50932 rw---   [ anon ] - right after 5s of running
000000c000000000  393216  388064  388064 rw---   [ anon ] - right after stress (stress stopped)
000000c000000000  393216  388072  388072 rw---   [ anon ] - +2min after
000000c000000000  393216  310312  310312 rw---   [ anon ] - +2min after
000000c000000000  393216  310056  310056 rw---   [ anon ] - +2min after

*A reduction of 78008 KB (after 6min) but without further decrease.

NEW PROCTREE (using an expirable cache with 60s TTL)
------------

000000c000000000   53248   50864   50864 rw---   [ anon ] - right after 5s of running
000000c000000000  397312  394204  394204 rw---   [ anon ] - right after stress (stress stopped)
000000c000000000  397312   73280   73280 rw---   [ anon ] - +2min after
000000c000000000  397312   49988   49988 rw---   [ anon ] - +2min after
000000c000000000  397312   48580   48580 rw---   [ anon ] - +2min after

*A reduction of 345624 KB (after stress and cache expiring).


976704f perf: remove mutex, add concurrency tests

That reduced lock contention.

/home/gg/.goenv/versions/1.22.4/bin/go test -benchmem -run=^$ -tags ebpf -bench ^BenchmarkProcessTree$ github.com/aquasecurity/tracee/pkg/proctree -benchtime=1000000x -race

| Benchmark                                |   new   |   old   | % Improv |
|                                          | (ns/op) | (ns/op) |          |
|------------------------------------------|---------|---------|----------|
| GetProcessByHash-Concurrency1-32         |  176.1  |  297.5  | 40.81%   |
| GetProcessByHash-Concurrency2-32         |  583.1  | 1513.0  | 61.46%   |
| GetProcessByHash-Concurrency4-32         | 1103.0  | 2404.0  | 54.12%   |
| GetProcessByHash-Concurrency8-32         | 3222.0  | 3605.0  | 10.62%   |
| GetOrCreateProcessByHash-Concurrency1-32 |  298.9  |  333.4  | 10.36%   |
| GetOrCreateProcessByHash-Concurrency2-32 |  759.7  | 1723.0  | 55.90%   |
| GetOrCreateProcessByHash-Concurrency4-32 | 1343.0  | 3133.0  | 57.14%   |
| GetOrCreateProcessByHash-Concurrency8-32 | 3532.0  | 5780.0  | 38.89%   |
| GetThreadByHash-Concurrency1-32          |  291.1  |  372.4  | 21.83%   |
| GetThreadByHash-Concurrency2-32          |  695.9  | 2630.0  | 73.54%   |
| GetThreadByHash-Concurrency4-32          | 1574.0  | 2503.0  | 37.12%   |
| GetThreadByHash-Concurrency8-32          | 3282.0  | 3301.0  |  0.58%   |
| GetOrCreateThreadByHash-Concurrency1-32  |  300.2  |  336.5  | 10.80%   |
| GetOrCreateThreadByHash-Concurrency2-32  |  837.1  | 2537.0  | 67.01%   |
| GetOrCreateThreadByHash-Concurrency4-32  | 1764.0  | 1704.0  | -3.52%   |
| GetOrCreateThreadByHash-Concurrency8-32  | 3946.0  | 5282.0  | 25.28%   |
Measurements Conclusion

With this PR, after stress and cache expiring, we have a decrease around 267616KB (310056KB [old final cache] - 48580KB [new final cache]) in the size of proctree and significant improvement in CPU time of the public methods of proctree.

3. Other comments

Public proctree methods already make use of the mutex of the inner lru
cache. This commit removes the outer mutex and adds concurrency tests
to ensure that the proctree is safe to use concurrently.
hashicorp LRU increases RSS over time. This change uses the proctree
lru cache to an expirable one, which will remove entries after a
certain time.

Two flags/options were added to the proctree command to control the
cache TTLs. The default is 120 seconds.

flags:

  --proctree process-cache-ttl=60
  --proctree thread-cache-ttl=60

yaml config:

    cache-ttl:
        process: 60
        thread: 60
comm had the same inner address as the original string which it was
created from (basename). This caused the original underlying bytes to
be preserved and not garbage collected.

Converting it to a byte slice and then back to a string guarantees that
a new string is created releasing the original reference.
@rscampos
Copy link
Contributor

rscampos commented Aug 23, 2024

double checked the benchmark in my env, good improvement!

GOOS=linux CC=clang GOARCH=arm64 CGO_CFLAGS="-I/vagrant/dist/libbpf -I/vagrant/3rdparty/libbpf/include/uapi" CGO_LDFLAGS="-L/vagrant/dist/libbpf/obj -lbpf " go test -benchmem -run=^$ -bench ^BenchmarkProcessTree$ github.com/aquasecurity/tracee/pkg/proctree -benchtime=1000000x -race

BenchmarkProcessTree/GetProcessByHash-Concurrency1-4             612.0 ns/op  309.2 ns/op  49.45%
BenchmarkProcessTree/GetProcessByHash-Concurrency2-4            1508 ns/op   947.5 ns/op  37.20%
BenchmarkProcessTree/GetProcessByHash-Concurrency4-4            3133 ns/op   1918 ns/op   38.77%
BenchmarkProcessTree/GetProcessByHash-Concurrency8-4            6332 ns/op   3646 ns/op   42.44%
BenchmarkProcessTree/GetOrCreateProcessByHash-Concurrency1-4     667.8 ns/op  491.9 ns/op  26.35%
BenchmarkProcessTree/GetOrCreateProcessByHash-Concurrency2-4    1694 ns/op   1377 ns/op   18.71%
BenchmarkProcessTree/GetOrCreateProcessByHash-Concurrency4-4    3396 ns/op   2707 ns/op   20.29%
BenchmarkProcessTree/GetOrCreateProcessByHash-Concurrency8-4    6842 ns/op   5419 ns/op   20.83%
BenchmarkProcessTree/GetThreadByHash-Concurrency1-4              667.1 ns/op  499.3 ns/op  25.14%
BenchmarkProcessTree/GetThreadByHash-Concurrency2-4             1673 ns/op   1381 ns/op   17.46%
BenchmarkProcessTree/GetThreadByHash-Concurrency4-4             3405 ns/op   2615 ns/op   23.21%
BenchmarkProcessTree/GetThreadByHash-Concurrency8-4             6844 ns/op   5317 ns/op   22.31%
BenchmarkProcessTree/GetOrCreateThreadByHash-Concurrency1-4      697.7 ns/op  522.8 ns/op  25.09%
BenchmarkProcessTree/GetOrCreateThreadByHash-Concurrency2-4     1694 ns/op   1377 ns/op   18.72%
BenchmarkProcessTree/GetOrCreateThreadByHash-Concurrency4-4     3506 ns/op   2860 ns/op   18.44%
BenchmarkProcessTree/GetOrCreateThreadByHash-Concurrency8-4     6809 ns/op   5406 ns/op   20.60%

Copy link
Contributor

@rscampos rscampos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, congrats for the good job :)

@geyslan geyslan merged commit 95d2ff1 into aquasecurity:main Aug 23, 2024
30 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants